-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use installed JEP DLL rather than packaged DLL #50
Conversation
src/main/java/ghidrathon/interpreter/GhidrathonInterpreter.java
Outdated
Show resolved
Hide resolved
src/main/java/ghidrathon/interpreter/GhidrathonInterpreter.java
Outdated
Show resolved
Hide resolved
src/main/java/ghidrathon/interpreter/GhidrathonInterpreter.java
Outdated
Show resolved
Hide resolved
src/main/java/ghidrathon/interpreter/GhidrathonInterpreter.java
Outdated
Show resolved
Hide resolved
|
||
// configure Java names that will be ignored when importing from Python | ||
for (String name : ghidrathonConfig.getJavaExcludeLibs()) { | ||
private PathMatcher getJepDllPathMatcher() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- docstring comments needed for all new routines
if (arch == "amd64") { | ||
// x86 | ||
return FileSystems.getDefault().getPathMatcher("glob:**libjep.so"); | ||
} else if (arch == "arm64") { | ||
// arm m1 | ||
// TODO: just guessing this arch name arm64 | ||
return FileSystems.getDefault().getPathMatcher("glob:**libjep.jnilib"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- testing needed on m1 mac
- testing needed on x86 mac
} else if (os.indexOf("win") >= 0) { | ||
return FileSystems.getDefault().getPathMatcher("glob:**jep.dll"); | ||
} else if (os.indexOf("nux") >= 0) { | ||
return FileSystems.getDefault().getPathMatcher("glob:**libjep.so"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- testing needed on linux
Path pythonPathJep = findPythonPathJep(); | ||
if (pythonPathJep != null) { | ||
log.info("found JEP dll via PYTHONPATH: " + pythonPathJep); | ||
} | ||
|
||
try { | ||
// if this is set, it takes precedence over system python | ||
Path virtualenvJep = findVirtualEnvJep(); | ||
if (virtualenvJep != null) { | ||
log.info("found JEP dll via VIRTUAL_ENV: " + virtualenvJep); | ||
} | ||
|
||
MainInterpreter.setJepLibraryPath(nativeJep.getAbsolutePath()); | ||
// fall back to whatever python3 references | ||
Path systemJep = findSystemJep(); | ||
if (systemJep != null) { | ||
log.info("found JEP dll via python3: " + systemJep); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we collect all the paths first so that we can log out the status fully, then we actually set the config second.
when only VIRTUAL_ENV is present then we'll need to update the python search path. i'm not sure if we can set the PYTHONPATH environment variable from java or we need to shim the interpreter loading to update sys.path. |
if the installation is not working, it would be nice to show a dialog to the user and explain how they can setup the dependencies. adding this doesn't block the merge here, though. |
one rough spot i encountered during development was when JEP crashed, such as when the python interpreter fails to load (missing JEP module), and it brings down all of Ghidra immediately. this was one of the reasons using JNI was discouraged by the Ghidra devs - that bugs make the whole program unreliable versus a single component crashing. to avoid this, we should consider invoking python (and loading JEP) as a subprocess and asserting that it doesn't crash before trying to start the embedded interpreter. if it does crash, we can warn the user and bail safely. |
we should also consider asserting that the jep.dll architecture matches the running java architecture. especially on mac m1 systems, it's easy to mix up formats since the intel emulation via Rosetta is so seamless. not a blocker, but a nice to have. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@williballenthin this is great! Thank you for taking the time to think this though along with the implementation.
I left a couple of comments for your review. Also, I have some additional thoughts/questions/comments:
1
How are we handling jep.jar
? From Jep's documentation
If you would like to include jep as part of your application, you can place the files as necessary presuming the following conditions are met:
The jep.jar is accessible to the Java classloaders (typically through the Java classpath)
The shared library (jep.so or jep.dll) is accessible by the Java process (typically through -Djava.library.path or the environment variable LD_LIBRARY_PATH)
Are we going to package jep.jar
with the Ghidrathon extension? I don't recall if I tried this in the past, but it would be interesting if this is possible. Jep
makes the jep.jar
available via Maven which could lead to a slick CI build process for the extension. I imagine we ship the Ghidrathon extension with jep.jar
pulled via Maven and then require users to build the shared library on their system, which we then resolve at runtime as you've shown here.
Concerns with this approach include:
- How do we handle version mismatches between
jep.jar
and the shared library? - Can we assume
jep.jar
works across OS and architectures for which users may build the Jep shared library?
2
Are we concerned with the overhead of locating the Jep shared library, especially with using subprocesses? Is this something we should resolve once and store for future use?
|
||
// whoops try Windows | ||
nativeJep = Application.getOSFile(GhidrathonUtils.THIS_EXTENSION_NAME, "jep.dll"); | ||
private void setJepPaths() throws JepException, FileNotFoundException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider using Jep's LibraryLocator? LibraryLocator
should cover PYTHON_HOME
and VIRTUAL_ENV
lending less code for us to maintain. It won't, however, cover your third solution for resolving the shared Jep library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i didn't consider it because i didn't notice it, thank you! i think i can replace most of this PR with a call to that routine 🥳
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dug into this some more and realized that MainInterpreter
is coded to use the LibraryLocator
if the Jep
library path isn't manually set...https://github.com/ninia/jep/blob/056ce9907f5ecbf2364df1ec55755404b2e8a947/src/main/java/jep/MainInterpreter.java#L124-L135. Essentially, if we don't do anything then Jep
attempts to find the Jep
native library.
I tested this locally in a Linux environment and it worked great with the caveat that the LibraryLocator
is naive in that it chooses the first Jep
native library that it finds e.g. if you have two Python installs, both with Jep
installed, then we don't have a choice (that I can find) as to which is selected. I don't think this is a major roadblock as it, so far, "just works" without additional code on our end and the multiple Python installs can be addressed by the user either by:
- ensuring only 1 Python install also has
Jep
installed - using virtual environments
I'm going to test this on Windows next.
} | ||
|
||
private Path findSystemJep() { | ||
String output = execCmd("python3", "-c", "import sys; import base64; print((b' '.join(map(lambda s: base64.b64encode(s.encode('utf-8')), sys.path))).decode('ascii'))"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we assuming python3
is available on all systems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. can you give a scenario in which we expect Ghidrathon to work but python3 is not available? i can't immediately think of one, but i dont feel reliable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably don't need this with LibraryLocator, so lets not spend a lot of time answering this question unless that other strategy fails.
We can use PyConfig to configure things like |
Was the crash occurring because Ghidrathon did not bail early enough after not being able to locate the Jep module? |
Initially I used this; however, I learned that PYTHONHOME is not the same thing as PYTHONPATH and is basically useless (it doesn't have a meaning on Windows and represents a fragment of a path to the library directory on Unix. its not what i thought). in fact, i don't know why JEP supports this because i can't find a use for this setting. setting PYTHONPATH is probably needed. I think we can prep the new interpreters by running |
I think the crash came from within JEP (java side) as it created the Python embedded interpreter, which looked for JEP (python side) that wasn't accessible (such as if I had PYTHONPATH wrong). then the interpeter raised an exception (python side) and JEP (java side) somehow crashed. im not exactly sure of the details since all of Ghidra would disappear from the screen and no logs are written. wrapping the JEP interpreter creation in try/catch Exception had no effect. in the above scenario, jep.dll is located and invoked; however, the python configuration is incomplete. i think probably the java side tries to reach into the python side, but can't because the python side libraries don't load. ...or something. it might be nice to triage this, and if its a JEP edge case, fix it upstream. in the meantime, i propose trying to detect a correctly install JEP (python side) via subprocess before creating the embedded interpreter. |
i expect the overhead to be a fraction of a second (walk a few file system directories, create a short lived python process), which I think is reasonable. python and java do many of the same operations as they start themselves up, too. we could do a bit better by short circuiting upon finding the first jep.dll (at the expense of less debugging details, no big deal). im quite wary of caching the results because of the complexity involved in detecting changes to the environment. i think we should explore this option if we determine Ghidrathon startup time is too long - but not before then. |
yes. i agree there is then the potential for version mismatching. i'll need to research if JEP verifies the versions or if we need to do this. i hope and expect its the former. in any case, we'll have to ask users to do something like: since jep.jar is available on maven, i suspect that jep.jar should be portable across OS, architectures, and java versions. i'll confirm by using the same Ghidrathon.zip across the different hardware i have available. |
Gotcha' - in the past when Ghidra has crashed a file named along the lines of |
We should also consider situations where multiple Python installs exist on a system where each install includes Jep e.g. a user has Python 3.9 and Python 3.10 installed both w/ Jep version 4.1.1. How do we choose? I think ideally it would be up to the user, but this poses additional challenges. Do we prompt every time or cache the choice? What happens in Ghidra headless mode e.g. Ghidra is configured into an analysis pipeline? |
Unfortunately virtual environments do not work on Windows due to the way Ghidra is invoked. We lose virtual environment env variables when a child process is created for Ghidra e.g. |
Thank you for your contribution - this has been superseded by 32f3969. |
This PR addresses #49 and makes Ghidrathon substantially easier to install: it relies on the jep.dll built during
pip install jep
and therefore doesn't require a Gradle build step by the user. Instead, everyone can use the same pre-built Ghidrathon.zip. An example is attached. In theory it should be a drop-in replacement for whatever Ghidrathon.zip you have already.The PR updates the interpreter configuration to search for the jep.dll in three places:
This works on my Windows system, Ghidra 10.3. I need to test this further, especially on my m1 Mac Mini. We should also setup testing in CI/CD to demonstrate this works on those platforms. Finally, documentation updates are needed if this PR is approved.
TODO:
If merged, this supercedes #4 #46 and #48 and closes #1 #3 #10 #27 #34 and #49
ghidra_10.3_PUBLIC_20230613_Ghidrathon-50.zip